Exome‐based new allele‐specific PCR markers and transferability for sodicity tolerance in bread wheat (Triticum aestivum L.)

Abstract Targeted exome‐based genotype by sequencing (t‐GBS), a sequencing technology that tags SNPs and haplotypes in gene‐rich regions, was used in previous genome‐wide association studies (GWAS) for sodicity tolerance in bread wheat. Thirty‐nine novel SNPs, including 18 haplotypes, were identified for yield and yield components. The present study aimed to develop SNP‐derived markers by precisely locating the new SNPs on the ~180 bp allelic sequences of t‐GBS, to validate the markers, and to characterize SNP function based on exonic location. We identified the previously unknown locations of significant SNPs/haplotypes by aligning the allelic sequences to IWGSC RefSeqv1.0 on the respective chromosomes. Eighteen of the 39 target SNP locations fulfilled the criteria for producing PCR markers, of which only eight produced polymorphic signals. These eight markers, associated with yield, plants m−2, heads m−2, and harvest index and including a pleiotropic marker for yield, harvest index, and grains/head, were validated for their amplification efficiency and phenotypic effects in the focused identification of germplasm strategy (FIGS) wheat set and a doubled haploid (DH) population (Scepter/IG107116). The phenotypic variation explained by these markers ranged from 4.1% to 37.6% in the FIGS population. High‐throughput PCR‐based genotyping using the new markers, and association with phenotypes in the FIGS wheat set and DH population, validated the effect of functional SNPs on closely associated genes: calcineurin B‐like and dirigent proteins, and basic helix–loop–helix (BHLH), plant homeodomain (PHD), and helix–turn–helix myeloblastosis (HTH myb)‐type transcription factors. Further, genome‐wide SNP annotation using the SnpEff tool confirmed that these SNPs lie in gene regulatory regions (upstream, 3′‐UTR, and intron) with potential effects on gene expression and protein coding.
This integrated approach of marker design for t‐GBS alleles, SNP functional annotation, and high‐throughput genotyping of functional SNPs offers translational solutions across crops and complex traits in crop improvement programs.


| INTRODUCTION
Bread wheat is an allohexaploid with a very complex (>80% repetitive DNA) and large genome (approx. 17 Gb, 5× the human genome and 40× the rice genome). The three subgenomes (A, B, and D) share a high level of coding sequence similarity (~95%) between homoeologous genes (He et al., 2019; Huang et al., 2002; Krasileva et al., 2017; Zhou et al., 2020). This high sequence conservation between homoeologs, coupled with the large genome size, makes amplifying sequence from a specific subgenome challenging. However, recent advances in genomics are now enabling geneticists to better understand and tailor gene expression. Advances in whole-genome sequencing technology have facilitated a more comprehensive view of diversity and gene function in plants. The availability of a chromosome-based genome sequence of common wheat (Alaux et al., 2018) and high-throughput genome sequencing technologies have significantly changed the landscape of wheat genomics research and development (Borrill et al., 2015; Jiao et al., 2011). The development of genotyping platforms such as the 90K iSelect SNP array has fast-tracked the discovery and deployment of qualitative and quantitative economic traits in common wheat (Allen et al., 2017; Wang et al., 2014).
The genotyping-by-sequencing (GBS) platform enables high-throughput identification of new variants throughout the genome that contribute to the trait of interest (Deschamps et al., 2012; Ott et al., 2017). Recently, a targeted exome-based genotype by sequencing (t-GBS) approach was implemented by He et al. (2019) for mapping targeted genes controlling major agronomic traits. The t-GBS platform generates low rates of missing data owing to genome reduction and captures heterozygous and exome-based alleles, which is crucial for improving quantitative traits. In a previous investigation, the t-GBS platform enabled tagging of exome-based de novo SNPs and haplotypes within diverse wheat lines that contribute to tolerance of yield and yield components on sodic-dispersive soils (Sharma et al., 2022). However, unlike SNP chips, the t-GBS assay does not report the position of each SNP within the ~180 bp allelic sequences, which is needed to design allele-specific SNP markers; this hinders the use of t-GBS functional alleles in practical breeding programs.
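Because t-GBS reports each locus as a pair of gap-filled allelic sequences rather than as a SNP coordinate, the first step toward a marker is simply to find where the two alleles differ. A minimal Python sketch, assuming the two allelic sequences have the same length and differ only by substitutions (no indels); `snp_positions` is a hypothetical helper, not part of any t-GBS software.

```python
from typing import List, Tuple


def snp_positions(ref_allele: str, alt_allele: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for every mismatch.

    Assumes both allelic sequences are already aligned and of equal length,
    i.e. they differ only by substitutions (no insertions or deletions).
    """
    if len(ref_allele) != len(alt_allele):
        raise ValueError("alleles must have equal length (substitutions only)")
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(ref_allele.upper(), alt_allele.upper()))
        if r != a
    ]


# Hypothetical 10 bp stretch of two alleles at one t-GBS locus.
allele_1 = "ACGTACGTAC"
allele_2 = "ACGAACGTTC"
print(snp_positions(allele_1, allele_2))  # [(4, 'T', 'A'), (9, 'A', 'T')]
```

Two or more mismatches within one allele pair, as in this toy case, correspond to a haplotype rather than a single SNP.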
Allele-specific SNP markers offer high-throughput analysis, low genotyping error rates, detection of co-dominant inheritance, and high genomic abundance (Allen et al., 2017; Bhoite et al., 2021; Harper et al., 2012). While high-throughput genotyping is becoming an integral component of plant breeding, converting trait-linked SNPs into allele-specific markers is challenging, particularly in polyploid crop species such as wheat, because of homoeologous and paralogous genomic sequences (Adamski et al., 2020; Ling et al., 2013; Uauy, 2017). Therefore, variants of interest must be selected in each homoeolog separately to design markers specific to that sequence. Advances in genomic resources and functional genomic tools have collectively improved the understanding of relationships between homoeologs and enabled the identification of homoeolog-specific variants, which can inform approaches to modulating quantitative trait responses in polyploid wheat.
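The idea of a homoeolog-specific variant can be made concrete with a small sketch in the spirit of PolyMarker (Ramirez-Gonzalez et al., 2015): given pre-aligned A-, B-, and D-genome copies of a gene, list the columns where the target copy carries a base that no other homoeolog shares. A primer whose 3′ end sits on such a column is more likely to amplify only the target subgenome. This is an illustration under stated assumptions (a pre-computed alignment, hypothetical sequences and function name), not PolyMarker's actual algorithm.

```python
from typing import List


def genome_specific_positions(target: str, others: List[str]) -> List[int]:
    """Return 0-based alignment columns where the target homoeolog's base
    differs from the base of every other homoeolog.

    Assumes all sequences are already aligned to the same length; a gap ('-')
    in another homoeolog counts as a difference, a gap in the target is skipped.
    """
    for seq in others:
        if len(seq) != len(target):
            raise ValueError("homoeolog sequences must be pre-aligned to equal length")
    positions = []
    for i, base in enumerate(target.upper()):
        if base == "-":
            continue  # nothing in the target to anchor a primer base on
        if all(seq[i].upper() != base for seq in others):
            positions.append(i)
    return positions


# Hypothetical aligned fragments of one gene on 5A (target), 5B, and 5D.
target_5a = "ACGTTGCA"
others = ["ACGATGCA", "ACCATGCA"]
print(genome_specific_positions(target_5a, others))  # [3]
```

In the toy alignment, column 2 differs from the 5D copy only, so it would not discriminate against 5B; only column 3 is diagnostic against both homoeologs.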
Soil sodicity is a complex subsoil constraint that affects all wheat developmental stages and causes loss of productivity (GRDC fact sheet, 2020; Sharma, 2017). Our previous genome-wide association studies (GWAS) demonstrated that sodicity tolerance is a complex trait and reported a number of SNPs and haplotypes contributing to improved yield and yield components on sodic soils (Sharma et al., 2022). The objectives of this study were to (1) locate significant de novo SNPs/haplotypes on the ~180 bp allelic sequences of the t-GBS platform with respect to IWGSC RefSeqv1.0; (2) design allele-specific SNP/haplotype markers for the variants contributing to sodicity tolerance in wheat (reported by Sharma et al., 2022) and check the amplification efficiency of the alleles by PCR-based high-throughput genotyping; (3) annotate and characterize functional variants using the SNP effect (SnpEff) pipeline; and (4) validate marker effects in the full focused identification of germplasm strategy (FIGS) wheat set and a biparental doubled haploid (DH) population (Scepter/IG107116) by associating high-throughput genotype data with phenotype data.

| SNP validation and locating exome-based SNPs on t-GBS allelic sequences
The respective ~180 bp allelic sequences (Table S1) were aligned to IWGSC RefSeqv1.0 on the respective chromosomes (Table S2). Of the 39 significant SNPs, seven SNP locations could not be mapped at the given location on the reference sequence and were therefore removed from further analysis. The remaining 32 target variant sequences were used to design allele-specific markers. Of these 32, only 18 target variant sequences successfully produced allele-specific markers (Table 1).

| Genome-wide SNP annotation and characterization
The final variant call format (VCF) file after quality filtering contained 25,448 variants. The variants were annotated into functional categories (Table S3).
Figure 1. Targeted exome-based genotype by sequencing (t-GBS) and allele-specific marker design. (a) Custom wheat exome t-GBS assay with target and de novo SNPs within the 180 bp gap-filled allelic sequence between the probes. (b) Genotype calls for SNPs and haplotypes. (c) Design of allele-specific markers for de novo SNPs and haplotypes on a stretch of 180 bp gap-filled exome-target sequence.
Table 1. Significant SNPs with their favorable allele, allele frequency, percentage phenotypic variation, and variant consequence (source: Sharma et al., 2022).
The largest share of SNPs lies in intergenic regions (54%), followed by upstream (11.1%) and downstream (10.5%) gene regions and intronic regions (7.7%). Synonymous and missense variants accounted for 5.6% and 6.7%, respectively, followed by 3′ UTR (2.1%) and 5′ UTR (0.7%) variants. The significant SNPs/haplotypes contributing to yield and yield components on sodic-dispersive soils (Sharma et al., 2022) were selected for detailed investigation. Most of the SNPs associated with tolerance to sodic-dispersive soils lie in regulatory regions (upstream, 3′-UTR, and intron), with few missense mutations (Tables 1 and 5). The successfully amplified markers for yield, plants m−2, heads m−2, and harvest index are presented in Table 5, with their impact type, variant consequence, associated genes, and transcription factors (TFs).
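The category percentages above are tallies of SnpEff consequence terms over the filtered variants. A minimal sketch of such a tally, assuming a SnpEff-annotated VCF whose INFO column carries the `ANN` field (`Allele|Annotation|Annotation_Impact|Gene_Name|...`) and counting only the first annotation listed for each variant; the records, positions, and gene IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def first_consequence(info: str) -> Optional[str]:
    """Return the consequence term of the first ANN entry in a VCF INFO string.

    Multiple ANN entries are comma-separated; combined terms joined by '&'
    are split and only the first is kept (a simplification in this sketch).
    """
    for field in info.split(";"):
        if field.startswith("ANN="):
            parts = field[len("ANN="):].split(",")[0].split("|")
            return parts[1].split("&")[0] if len(parts) > 1 else None
    return None


def consequence_percentages(vcf_lines: Iterable[str]) -> Dict[str, float]:
    """Percentage of annotated variants per consequence term."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        info = line.rstrip("\n").split("\t")[7]  # column 8 of a VCF record is INFO
        term = first_consequence(info)
        if term:
            counts[term] += 1
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()} if total else {}


# Toy records (hypothetical) to show the tally.
toy = [
    "chr5A\t100\t.\tA\tG\t.\tPASS\tANN=G|intergenic_region|MODIFIER|GENE1",
    "chr5A\t250\t.\tC\tT\t.\tPASS\tANN=T|intergenic_region|MODIFIER|GENE1",
    "chr5A\t400\t.\tG\tA\t.\tPASS\tANN=A|missense_variant|MODERATE|GENE2",
    "chr5A\t900\t.\tT\tC\t.\tPASS\tANN=C|upstream_gene_variant|MODIFIER|GENE3",
]
print(consequence_percentages(toy))
```

Counting one term per variant keeps the percentages summing to 100 over annotated variants; counting every ANN entry instead would weight variants that overlap several transcripts more heavily.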

| Allele-specific PCR markers and polymorphic clusters
The FIGS wheat set (192 lines) was genotyped with 18 allele-specific markers (Table S4). The allele discrimination plots for the full FIGS set are presented in Figure
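An allele discrimination plot places each sample by its two allele-specific fluorescence signals (for example, FAM against VIC or HEX). The sketch below classifies one sample from the fraction of total signal in the first channel; the thresholds and the function `call_genotype` are hypothetical illustrations, not the clustering algorithm of the QuantStudio software referred to in this document.

```python
from typing import Tuple


def call_genotype(signal_x: float, signal_y: float,
                  min_signal: float = 0.5,
                  hom_cut: float = 0.8,
                  het_band: Tuple[float, float] = (0.35, 0.65)) -> str:
    """Classify one well from its two allele-specific fluorescence signals.

    All thresholds are hypothetical placeholders for illustration.
    """
    total = signal_x + signal_y
    if total < min_signal:
        return "no_call"  # too little amplification to call
    frac_x = signal_x / total  # share of signal from the allele-X dye
    if frac_x >= hom_cut:
        return "allele_X"  # homozygous for allele X
    if frac_x <= 1 - hom_cut:
        return "allele_Y"  # homozygous for allele Y
    if het_band[0] <= frac_x <= het_band[1]:
        return "het"
    return "undetermined"  # falls between clusters


for sig in [(2.0, 0.2), (0.1, 1.9), (1.0, 1.0), (0.1, 0.1), (1.4, 0.6)]:
    print(sig, call_genotype(*sig))
```

Samples falling between clusters are left undetermined rather than forced into a group, which matches the practice of keeping only confident calls for downstream association.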

| High throughput genotyping of functional SNPs
Genotyping of the full FIGS set and the DH population using the new SNP markers […] SNPs are presented in Table 5. The box plots in Figure 4 represent the phenotypic variation for all successfully amplified markers at a given SNP location associated with the genes stated above. The favorable homozygous allelic effect is presented in the first box, followed by the unfavorable allelic effect in the second box, to promote uniform visual[…] It was revealed that most of the genes have the highest expression at critical wheat growth stages (germination, stem elongation, anthesis, and grain development) (Table 5) (Table S1). Therefore, in the present investigation, we identified the allelic location by aligning each t-GBS allelic sequence to the reference chromosomal sequence (IWGSC RefSeq v1.0; Alaux et al., 2018; Table S2). The SNP markers were then designed around the allelic locations contributing to yield and yield-component tolerance on sodic soils. Some SNP locations did not support marker design because of suboptimal GC content, high hairpin-loop stability, location in a repetitive sequence, unacceptable product size, lack of specificity to the target chromosome, or the presence of many homoeologs (Grewal et al., 2020; Ramirez-Gonzalez et al., 2015) (Figure 3).
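Two of the rejection criteria listed above, GC content and product size, are simple enough to screen before running a full primer-design tool. A minimal sketch with hypothetical thresholds (40–60% GC, 50–150 bp product); the Wallace-rule melting temperature is only a crude approximation for short oligos, and hairpin stability or repeat content would need a thermodynamic or alignment-based tool such as Primer3, which is not reproduced here. The function names are hypothetical.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def passes_basic_qc(primer: str, product_size: int,
                    gc_range: Tuple[float, float] = (40.0, 60.0),
                    size_range: Tuple[int, int] = (50, 150)) -> Tuple[bool, List[str]]:
    """Return (passed, reasons for failure) under hypothetical thresholds."""
    reasons = []
    gc = gc_percent(primer)
    if not gc_range[0] <= gc <= gc_range[1]:
        reasons.append(f"GC {gc:.1f}% outside {gc_range}")
    if not size_range[0] <= product_size <= size_range[1]:
        reasons.append(f"product {product_size} bp outside {size_range}")
    return (not reasons, reasons)


primer = "AGCTGACCTGAAGTCAGTCA"  # hypothetical 20-mer
print(gc_percent(primer), wallace_tm(primer), passes_basic_qc(primer, 90))
```

Returning the list of failure reasons, rather than a bare boolean, makes it easy to tabulate why each SNP location was rejected.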

| Functional genomics and SNP characterization
Functional SNPs acting at key wheat developmental stages are among the major tools for confirming the presence of desirable alleles in plant breeding to enhance complex traits such as sodicity tolerance (Cingolani et al., 2012; Sarkar & Maranas, 2020). Variant annotation and functional characterization identified the locations of variants (SNPs/haplotypes) in the exome and their impact on protein coding (Table 5). The SNPs identified in this study for plants m−2, heads m−2, and harvest index lie in the gene regulatory and intronic regions of genes encoding BHLH-, PHD-, and HTH myb-type TFs, which are known to improve abiotic stress tolerance without a yield penalty (Mickelbart et al., 2015). SNPs in regulatory regions have significant roles in the activation of effector molecules directly involved in mitigating stress (Cingolani et al., 2012; Cummings et al., 2019; Vij & Tyagi, 2007; Watanabe et al., 2019; Yamaguchi-Shinozaki & Shinozaki, 2006; Yang & Wang, 2015; Yao et al., 2014). These functional SNPs could be used as targets for molecular breeding and high-throughput identification of target mutations in wheat improvement programs. SNPs with missense mutations and those in regulatory regions (3′-UTR, upstream, and downstream; Table 1) could be investigated further as targets for gene editing to enhance complex traits.

| High-throughput genotyping using new exome-based PCR markers
High-throughput genotyping using functional SNP markers (closely associated with genes) in the full FIGS set, and correlation with phenotypic values (Figure 4, Table 3), validated genes encoding calcineurin B-like proteins (Borjigin et al., 2021; Gao et al., 2015) and dirigent proteins (Khan et al., 2018; Li et al., 2017) and the BHLH-, PHD-, and HTH myb-type TFs (Table 5) (Bhoite et al., 2021; Mickelbart et al., 2015; Roy, 2016). The exome-based alleles/SNPs lie in gene-rich regions and regulate the function of candidate genes; therefore, the positive effect on trait expression of the candidate genes associated with the favorable allele/SNP was also confirmed. Functional gene validation through molecular techniques such as gene cloning and gene knockout is generally required to confirm the effect of genes on trait expression. However, for a complex quantitative trait such as sodicity tolerance, recording the phenotypic effect of multiple minor modifier genes throughout wheat development has been challenging because these genes do not produce distinct phenotypes. It is also difficult to imitate complex subsoil constraints in pot studies. Using exome-based functional SNPs is therefore paramount for investigating the genetics of complex quantitative traits. For this reason, in our previous GWAS on sodicity tolerance (Sharma et al., 2022), we used exome-based alleles, which are directly or indirectly involved in gene expression and protein coding.
Sodic-dispersive soils have high levels of sodium ions (Na+) and clay. The negatively charged clay particles attract positively charged Na+ ions, forming a massive duplex, dispersive soil with a higher pH (>8) (Arif et al., 2020; Orton et al., 2018). As a result, wheat development and root patterning are disturbed, and the acclimation response depends on redox and reactive oxygen species signaling, calcium, and plant hormones. Calcineurin B-like protein is a Ca2+ sensor that increases intracellular Ca2+ concentration, channels K+ ions, and sequesters excess Na+ ions into the vacuolar space (Borjigin et al., 2021; Gao et al., 2015). This concerted action improves K+/Na+ equilibrium and cellular ionic homeostasis. Dirigent proteins are predominantly expressed in stems and have roles in the biosynthesis of lignin-like molecules, secondary metabolism, and fiber synthesis. This metabolic plasticity is a defense response against abiotic stresses (Khan et al., 2018; Li et al., 2017). The TFs (BHLH-domain, PHD-, and HTH myb-type) play a critical role in plant growth and development and are significantly involved in several abiotic stress tolerance mechanisms and in reactive oxygen species signaling. These TFs are usually coexpressed to enhance grain size and yield under abiotic stress in cereal crops (Bhoite et al., 2021; Roy, 2016). The reported genes are highly expressed during germination and anthesis, the major stages targeted by the sodicity constraint (Sharma et al., 2022). Therefore, the reported genes are the best candidates for wheat improvement on sodic soils.
Table 5. Successfully amplified allele-specific SNPs and haplotypes with their impact type on genes and variant consequence.
Quantitative traits are governed by many minor genes, and deciphering gene function through gene-by-gene functional validation is arduous and impractical (Mackay, 2001; MacKay et al., 2009) (Table S1). The possible mutations in allelic sequences that improve the trait value were considered potential high-LD haplotypes (Bhat et al., 2021; Stram, 2017) (Figure 1c). To design a marker, the precise location of the SNPs/haplotypes must be known with respect to the reference sequence. Therefore, each allelic sequence identified for a trait was aligned to IWGSC RefSeqv1.0 on the respective chromosome (Table S2). The IWGSC RefSeqv1.0 sequences for the respective chromosomes (Figure 1) were retrieved from URGI (https://wheat-urgi.versailles.inra.fr/Seq-Repository) for Chinese Spring v1.0 (Alaux et al., 2018). Alignment coverage and identity greater than 99% were used as the criteria for a valid BLAST hit on the chromosome specific to the GWAS signal. […] Applied Biosystems® (ViiA 7). Allele discrimination and fluorescent signals of the SNP markers were assessed in the QuantStudio 5 software.
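The valid-hit rule (identity and coverage above 99% on the chromosome of the GWAS signal) and the conversion of a SNP offset within the allelic sequence into a chromosome coordinate can be sketched as follows. Assumptions: BLAST+ tabular output (`-outfmt 6`, default 12 columns), the query length supplied separately, chromosome names such as `chr5A`, and no alignment gaps between the query start and the SNP. The helper names and the example hit are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_hit(line: str) -> Hit:
    """Parse one line of BLAST+ -outfmt 6 (qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_valid_hit(hit: Hit, query_len: int, expected_chrom: str,
                 min_identity: float = 99.0, min_coverage: float = 99.0) -> bool:
    """Identity and query coverage must both exceed the thresholds on the GWAS chromosome."""
    coverage = 100.0 * (hit.qend - hit.qstart + 1) / query_len
    return (hit.sseqid == expected_chrom and hit.pident > min_identity
            and coverage > min_coverage)


def snp_chrom_position(hit: Hit, snp_offset: int) -> int:
    """Map a 1-based SNP position in the query onto the chromosome.

    Assumes no gaps in the alignment between qstart and the SNP.
    """
    if not hit.qstart <= snp_offset <= hit.qend:
        raise ValueError("SNP lies outside the aligned part of the query")
    delta = snp_offset - hit.qstart
    if hit.sstart <= hit.send:   # query aligned to the plus strand
        return hit.sstart + delta
    return hit.sstart - delta    # minus strand: coordinates decrease


line = "allele_1\tchr5A\t100.000\t180\t0\t0\t1\t180\t500001\t500180\t1e-90\t333"
hit = parse_hit(line)
print(is_valid_hit(hit, 180, "chr5A"), snp_chrom_position(hit, 95))  # True 500095
```

Handling the minus strand explicitly matters because BLAST reports `sstart > send` when the allelic sequence aligns to the reverse complement of the chromosome.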

| Design of exome-specific PCR markers
Allele calls with a quality score >95 were considered for high-throughput gene validation. The FIGS lines were genotyped using the new gene-associated functional markers, and lines carrying the favorable and alternative alleles were grouped separately. Genotypes and phenotypes were correlated to elucidate the effect of the gene-specific favorable allele on phenotypic expression. Markers polymorphic between Scepter and IG107116 were tested for marker effects in the DH lines.
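The call-quality filter, the grouping of lines by allele, and the parental polymorphism check can be expressed as a short routine. A sketch assuming each call is a tuple of line ID, called allele, and a quality score on a 0–100 scale; heterozygous or failed calls are dropped because they match neither allele group. The function names, line IDs, and alleles are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_allele(calls: Iterable[Tuple[str, str, float]],
                    favorable: str, alternative: str,
                    min_quality: float = 95.0) -> Dict[str, List[str]]:
    """Keep calls with quality > min_quality and split lines by allele."""
    groups: Dict[str, List[str]] = {favorable: [], alternative: []}
    for line_id, allele, quality in calls:
        if quality > min_quality and allele in groups:
            groups[allele].append(line_id)
    return groups


def is_polymorphic(parent_calls: Dict[str, Optional[str]], p1: str, p2: str) -> bool:
    """A marker is usable in a DH population only if the two parents differ."""
    a, b = parent_calls.get(p1), parent_calls.get(p2)
    return a is not None and b is not None and a != b


calls = [("L1", "G", 99.1), ("L2", "A", 97.0), ("L3", "G", 90.0), ("L4", "het", 99.0)]
print(group_by_allele(calls, favorable="G", alternative="A"))
print(is_polymorphic({"Scepter": "G", "IG107116": "A"}, "Scepter", "IG107116"))
```

Line L3 is excluded for low quality and L4 because a heterozygous call belongs to neither homozygous group.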
Lines were grouped based on allelic calls (favorable and alternative alleles) and Student's t-test was applied to test the significance of phenotypic differences between the groups.
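For two allele groups, Student's t-test uses a pooled variance. A standard-library sketch that returns the t statistic and degrees of freedom (a p-value can then be obtained, for example, with `scipy.stats.ttest_ind`, whose default `equal_var=True` performs Student's test), together with the share of phenotypic variance explained by the grouping, computed as the between-group over the total sum of squares. That second quantity is one simple definition of variance explained and is not necessarily how the GWAS-based values in this document were computed; the phenotype values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two lines")
    pooled = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / se, n1 + n2 - 2


def variance_explained(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Percentage of total phenotypic variance explained by allele group (SS_between / SS_total)."""
    values = list(group_a) + list(group_b)
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    if ss_total == 0:
        return 0.0
    ss_between = (len(group_a) * (mean(group_a) - grand) ** 2
                  + len(group_b) * (mean(group_b) - grand) ** 2)
    return 100.0 * ss_between / ss_total


favorable = [4.0, 5.0, 6.0]    # hypothetical yields, favorable allele
alternative = [1.0, 2.0, 3.0]  # hypothetical yields, alternative allele
print(student_t(favorable, alternative), variance_explained(favorable, alternative))
```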

| Exome-based SNP annotation and function
The VCF from the previous GWAS study was filtered to extract high-quality variants using the following parameters: minor allele frequency >0.05 and genotype information >80%. The final VCF file was subjected to […] (Cummings et al., 2019; Vij & Tyagi, 2007; Yang & Wang, 2015).
[…] Australia, and the Grain Research and Development Corporation (GRDC). Genotyping data were generated through DAV00127, a collaborative project involving DJPR (Victoria) and CSIRO, funded by GRDC.
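The two VCF filters of the SNP annotation methods (minor allele frequency and genotype information) can be sketched per site as follows. Assumptions: biallelic records with `GT` in the FORMAT column, the minor allele frequency threshold read as a frequency of 0.05 computed over called alleles, and "genotype information" interpreted as the fraction of samples with a non-missing call; both thresholds are applied as strict inequalities to match the ">" in the text. Function names and records are hypothetical.

```python
from typing import List, Tuple


def genotype_alleles(gt: str) -> List[str]:
    """'0/1' or '0|1' -> ['0', '1']; missing alleles ('.') are dropped."""
    return [a for a in gt.replace("|", "/").split("/") if a != "."]


def site_stats(vcf_line: str) -> Tuple[float, float]:
    """Return (minor allele frequency, call rate) for one biallelic VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")  # FORMAT is column 9
    samples = fields[9:]
    alleles: List[str] = []
    called = 0
    for sample in samples:
        parts = sample.split(":")
        obs = genotype_alleles(parts[gt_idx] if gt_idx < len(parts) else ".")
        if obs:
            called += 1
            alleles.extend(obs)
    call_rate = called / len(samples) if samples else 0.0
    if not alleles:
        return 0.0, call_rate
    alt_freq = sum(a != "0" for a in alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq), call_rate


def passes_filter(vcf_line: str, maf_threshold: float = 0.05,
                  call_rate_threshold: float = 0.80) -> bool:
    maf, call_rate = site_stats(vcf_line)
    return maf > maf_threshold and call_rate > call_rate_threshold


def record(gts: List[str]) -> str:
    """Build a toy VCF record (hypothetical position) with only a GT field."""
    return "\t".join(["chr1A", "100", ".", "A", "G", ".", "PASS", ".", "GT"] + gts)


line1 = record(["0/0", "0/0", "1/1", "0/0", "./."])  # 80% called: fails
line2 = record(["0/0"] * 9 + ["1/1"])              # MAF 0.1, all called: passes
print(site_stats(line1), passes_filter(line1), passes_filter(line2))
```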

CONFLICT OF INTEREST STATEMENT
The authors did not report any conflict of interest.

PEER REVIEW
The peer review history for this article is available in the supporting information for this article.

DATA AVAILABILITY STATEMENT
Genotype by sequencing data and significant allelic sequences for yield and yield-components are presented in supplementary tables.